The hubVis package contains a function called
plot_step_ahead_model_output() that can be used to plot
model output in the form of forecasts or projections that look
multiple horizons into the future.
This function plots forecasts/scenario projections and optional truth data. Faceted plots can be created for multiple scenarios, locations, forecast dates, models, etc. Currently, the function can plot only quantile data, with the possibility to add “median” information from the model projections.
For more information about the Hubverse standard format, please refer to the HubDocs website.
The following vignette describes the principal usage of the
plot_step_ahead_model_output() function.
Plots are available in two output formats: interactive (the default) and
static, obtained by setting the interactive parameter to FALSE. See the
examples later in this document.

The package contains two datasets that will be used for the following examples:
example_round1.csv: example of model output for a
round associated with the origin date: “2021-03-07” (called “round 1”),
target: “incident case”, for the US national level, from the example-complex-scenario-hub.
The data set also contains an ensemble calculated by applying the
function:
hubEnsembles::simple_ensemble(df_round1, agg_fun = "median")
truth_data.csv: example of target data from the example-complex-scenario-hub.
The data here comes from the "target-data/US_inc_case.csv"
file.
library(hubVis)
projection_path <- system.file("example_round1.csv", package = "hubVis")
projection_data <- read.csv(projection_path, stringsAsFactors = FALSE)
projection_data <- hubUtils::as_model_out_tbl(projection_data)
head(projection_data)
#> # A tibble: 6 × 9
#> model_id origin_date scenario_id location target horizon output_type
#> <chr> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 2 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 3 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 4 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 5 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 6 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> # ℹ 2 more variables: output_type_id <dbl>, value <dbl>
truth_path <- system.file("truth_data.csv", package = "hubVis")
truth_data <- read.csv(truth_path, stringsAsFactors = FALSE)
head(truth_data)
#> time_idx location value target
#> 1 2020-01-25 02 0 inc case
#> 2 2020-01-25 01 0 inc case
#> 3 2020-01-25 05 0 inc case
#> 4 2020-01-25 04 0 inc case
#> 5 2020-01-25 06 0 inc case
#> 6 2020-01-25 08 0 inc case

The model output data in the projection_data object
follows the structure of the
model_out_tbl class. This dataset is converted to a
model_out_tbl object after being read in above. In addition
to the standard requirements for this class, the
plot_step_ahead_model_output() function in
hubVis requires that the dataset have a column whose value
corresponds to the variable that should be used for the x-axis of a
“step ahead” plot. In general, this should be a date variable that
corresponds to the date which is the “target” of a particular
prediction. By default, the function looks for the "target_date"
column, although this can be overridden by specifying a different
column using the x_col_name argument. In our example data,
this column does not exist, so we add it below:
projection_data <- dplyr::mutate(
projection_data, target_date = as.Date(origin_date) + (horizon * 7) - 1)
head(projection_data)
#> # A tibble: 6 × 10
#> model_id origin_date scenario_id location target horizon output_type
#> <chr> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 2 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 3 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 4 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 5 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 6 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> # ℹ 3 more variables: output_type_id <dbl>, value <dbl>, target_date <date>

The plotting function requires only 2 parameters:
model_output_data: a
model_out_tbl object containing all the Hubverse
standard columns, including "target_date" and
"model_id" columns. Because all model output in
model_output_data is plotted, any filtering must happen before calling
this function.
truth_data: a data.frame object
containing the ground truth data, including the columns:
"time_idx" and "value".
The projection_data and truth_data contain
information for multiple locations and scenarios.
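Before filtering, it can help to confirm that the required columns are present and to list the scenario and location values that are available. The snippet below is a minimal base R sketch: the helper missing_cols() and the toy data frames are illustrative assumptions (they are not part of hubVis), and the toy values are made up.

```r
# Hypothetical helper (not part of hubVis): return the required column
# names that are absent from a data frame.
missing_cols <- function(df, required) {
  setdiff(required, names(df))
}

# Toy inputs standing in for projection_data and truth_data (made-up values)
toy_projection <- data.frame(
  model_id = c("hub-ensemble", "hub-ensemble", "hub-ensemble"),
  scenario_id = c("A-2021-03-05", "A-2021-03-05", "B-2021-03-05"),
  location = c("US", "01", "US"),
  target_date = as.Date(c("2021-03-13", "2021-03-13", "2021-03-13")),
  value = c(100, 10, 120),
  stringsAsFactors = FALSE
)
toy_truth <- data.frame(
  time_idx = as.Date("2021-03-06"),
  location = "US",
  value = 90,
  stringsAsFactors = FALSE
)

# Empty character vectors mean every required column is present
missing_model <- missing_cols(toy_projection, c("target_date", "model_id"))
missing_truth <- missing_cols(toy_truth, c("time_idx", "value"))

# Distinct values that can be used in the filtering step
scenarios <- unique(toy_projection$scenario_id)
locations <- unique(toy_projection$location)
print(scenarios)
print(locations)
```

The same calls can be applied to the real projection_data and truth_data objects to decide which values to pass to dplyr::filter().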
To plot the model projections for the US, Scenario A:
# Pre-filtering
projection_data_A_us <- dplyr::filter(projection_data,
scenario_id == "A-2021-03-05",
location == "US")
# Limit time_idx for layout reasons
truth_data_us <- dplyr::filter(truth_data, location == "US",
time_idx < min(projection_data$target_date) + 21,
time_idx > "2020-10-01")

By default, the 50%, 80% and 95% intervals are plotted, with a
specific color palette per model_id.
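For orientation, each interval corresponds to a pair of quantile levels in the model output. The sketch below assumes central (equal-tailed) intervals, so an 80% interval is bounded by the 0.1 and 0.9 quantiles; interval_bounds() is a hypothetical helper written for this illustration and is not a hubVis function.

```r
# Hypothetical helper (not part of hubVis): quantile levels that bound a
# central (equal-tailed) interval with the given coverage.
interval_bounds <- function(coverage) {
  lower <- (1 - coverage) / 2
  data.frame(interval = coverage, lower = lower, upper = 1 - lower)
}

# Quantile levels for the three intervals plotted by default
bounds <- do.call(rbind, lapply(c(0.5, 0.8, 0.95), interval_bounds))
print(bounds)
```

Under this assumption, the model output needs to contain these quantile levels in output_type_id for the corresponding interval to be drawn.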
In general, it is hard to see multiple intervals when multiple models are plotted, so specifying only one interval can be useful:
It is also possible to add a median line on the plot with the
use_median_as_point parameter:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
intervals = 0.8,
use_median_as_point = TRUE)

By default, plots are interactive, but they can easily be switched to static:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
intervals = 0.8,
use_median_as_point = TRUE,
interactive = FALSE)

A “facet” (or subplot) plot can also be created for each scenario:
# Pre-filtering
projection_data_us <- dplyr::filter(projection_data,
location == "US")
# Limit time_idx for layout reasons
truth_data_us <- dplyr::filter(truth_data, location == "US",
time_idx < min(projection_data$target_date) + 21,
time_idx > "2020-10-01")

The layout of the “facets” can be adjusted with the different
facet_ parameters.
plot_step_ahead_model_output(projection_data_us, truth_data_us,
use_median_as_point = TRUE,
facet = "scenario_id", facet_scales = "free_x",
facet_nrow = 2, facet_title = "bottom left")

Or with the additional facet_ncol parameter for the
static plot:
plot_step_ahead_model_output(projection_data_us, truth_data_us,
use_median_as_point = TRUE, interactive = FALSE,
facet = "scenario_id", facet_scales = "free_x",
facet_ncol = 4, facet_title = "bottom left"
)

A “facet” (or subplot) plot can also be created for each model. In
this case, the legend is adapted to show the
model_id values.
The legend can be removed with the parameter
show_legend = FALSE.
By default, the 50%, 80% and 95% intervals are plotted. However, it is also possible to plot the 90% interval or a subset of these intervals. When 6 or more models are plotted, only the widest interval provided is shown (95% by default).
To illustrate this, we will use the projections for only one model:
# Pre-filtering
projection_data_mod <- dplyr::filter(projection_data,
location == "US",
model_id == "hub-ensemble")
plot_step_ahead_model_output(projection_data_mod, truth_data_us,
use_median_as_point = TRUE, facet = "scenario_id",
facet_nrow = 2, intervals = c(0.5, 0.8, 0.9, 0.95))

The opacity of the intervals can be adjusted:
plot_step_ahead_model_output(projection_data_mod, truth_data_us,
use_median_as_point = TRUE, facet = "scenario_id",
facet_nrow = 2, intervals = c(0.5, 0.8, 0.9, 0.95),
fill_transparency = 0.15)

Plots without intervals are also possible:
Several other parameters are available to update the plot output. Here are examples of some of them.
It is possible to assign a specific color and behavior to a specific
model_id. Typically, this is done to highlight an ensemble,
so the names of these arguments are ens_name and
ens_color. The model specified by ens_name
will be the top layer of the resulting plot.
Several layout updates are possible:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
title = "Incident Cases in the US")

Change palette color and behavior:
The fill_by parameter can be changed to another valid
column name so that the legend and colors are mapped to that
column.
plot_step_ahead_model_output(projection_data_us, truth_data_us,
facet = "model_id", fill_by = "scenario_id")

It is possible to use only blues for all models by setting the
pal_color parameter to NULL. This might be
especially useful when plotting many models in conjunction with
highlighting the ensemble forecast using the ens_name and
ens_color arguments.
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
intervals = 0.8,
ens_name = "hub-ensemble", ens_color = "black",
pal_color = NULL, use_median_as_point = TRUE)

The default blue color can be changed with the one_color
parameter:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
intervals = 0.8, one_color = "orange",
ens_name = "hub-ensemble", ens_color = "black",
pal_color = NULL, use_median_as_point = TRUE)

The input data frames can have different column names for the date
information. In this case, the x_col_name and
x_truth_col_name parameters can be used to indicate the
variables that should be mapped to the x-axis.
names(truth_data_us)[names(truth_data_us) == "time_idx"] <- "time"
names(projection_data_A_us)[
names(projection_data_A_us) == "target_date"] <- "date"
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
x_col_name = "date", x_truth_col_name = "time")

For other parameters, please consult the documentation associated
with the function:
?plot_step_ahead_model_output